In this report, a principal components analysis (PCA) is conducted on environmental and climatic variables for countries worldwide. PCA is an ordination method that simplifies the analysis of multivariate relationships by projecting the original multidimensional space onto a small number of new axes (principal components); the first two are visualized in a two-dimensional plot called a biplot. The data were compiled and provided by Zander Venter on Kaggle and derived from publicly available remote sensing datasets uploaded to Google Earth Engine. Each variable was calculated as the mean value for each country at a reduction scale of about 10 km. The resulting biplot is used to assess correlations between variables.
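library(tidyverse) # packages assumed here; the report's setup chunk is not shown
library(janitor)
library(here)
# Read in .csv file and clean up column names to lower snake case
world_envi <- read_csv(here("data", "world_env_vars.csv")) %>%
  clean_names()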
# PCA data wrangling
world_pca <- world_envi %>%
  select(accessibility_to_cities:cloudiness) %>%
  select(-c("aspect", ends_with("_quart"))) %>% # drop aspect and variables ending in "_quart"
  drop_na() %>% # drop observations with any NA value
  scale() %>% # center each variable and scale it to unit variance
  prcomp() # run the PCA; returns a "prcomp" object (a list of results)
# Create a dataset that drops NAs but keeps all variables, used to add aesthetics to the biplot
# (its rows must match the rows that went into the PCA)
world_complete <- world_envi %>%
  drop_na()
# See the loadings (weighting for each principal component)
world_pca$rotation
## PC1 PC2 PC3 PC4
## accessibility_to_cities -0.01208814 0.080994313 0.155465417 0.63242853
## elevation 0.12170299 0.065453461 -0.615470039 0.16161011
## slope 0.00864602 0.217016648 -0.481304296 0.19220913
## cropland_cover 0.13351093 0.155728850 0.226416990 -0.45184318
## tree_canopy_cover -0.29269549 0.205747138 -0.093500133 -0.07559820
## isothermality -0.32878964 -0.155887442 -0.085683170 0.12450410
## rain_driest_month -0.21438961 0.286285974 0.085438103 0.02228879
## rain_mean_annual -0.34576890 0.145629407 -0.093835621 -0.06027416
## rain_seasonailty 0.02788678 -0.387636099 -0.173088940 0.08217680
## rain_wettest_month -0.32609753 0.030137008 -0.168669406 -0.08849432
## temp_annual_range 0.35312644 -0.009461463 -0.088583074 -0.10523824
## temp_diurnal_range 0.14399954 -0.364057154 -0.217151433 0.02517740
## temp_max_warmest_month -0.06144520 -0.430660760 0.053850184 -0.12001497
## temp_mean_annual -0.25427246 -0.342709756 0.071569639 -0.02314108
## temp_min_coldest_month -0.32182415 -0.230103452 0.101955929 0.01952353
## temp_seasonality 0.34155662 0.115176304 -0.009944495 -0.11240990
## wind 0.12845176 0.062683168 0.387239201 0.49558356
## cloudiness -0.22391303 0.301909855 -0.006242412 -0.08724119
## PC5 PC6 PC7 PC8
## accessibility_to_cities -0.20940502 -0.34696009 -0.116565237 -0.57668122
## elevation 0.15587800 -0.03043558 -0.288499379 0.05179766
## slope 0.33730599 0.31351559 0.369138172 -0.29290821
## cropland_cover 0.46601010 -0.17997139 -0.067095308 -0.56643051
## tree_canopy_cover -0.34281384 0.02064494 0.110487281 -0.03275171
## isothermality 0.05708578 0.02813448 -0.346186167 0.01322909
## rain_driest_month -0.21655722 0.49611096 -0.100839537 -0.18052830
## rain_mean_annual -0.14604622 -0.03146021 0.193989201 -0.07688583
## rain_seasonailty 0.09581165 -0.35189766 0.380832253 0.05562498
## rain_wettest_month -0.08015437 -0.31903191 0.387506144 -0.06479596
## temp_annual_range -0.36239474 -0.02715552 0.101732215 -0.10087448
## temp_diurnal_range -0.18678901 0.01382341 -0.317599982 -0.18875569
## temp_max_warmest_month -0.20075996 0.17467682 0.133857258 -0.24036604
## temp_mean_annual 0.04669893 0.11643201 0.065112460 -0.11648158
## temp_min_coldest_month 0.18458683 0.11859813 -0.009021261 -0.05049371
## temp_seasonality -0.33353806 0.02727738 0.213553448 -0.04367819
## wind 0.20266648 0.10162202 0.263856054 0.23502191
## cloudiness -0.03464879 -0.45158633 -0.196687259 0.18502412
## PC9 PC10 PC11 PC12
## accessibility_to_cities 0.19599603 0.057678326 0.06494913 -0.003068228
## elevation -0.21334184 0.173273287 0.26576447 -0.531465086
## slope 0.19783507 0.060187373 -0.21181922 0.207785643
## cropland_cover -0.25896624 -0.117488809 0.11599561 -0.126686911
## tree_canopy_cover 0.11130361 -0.543974450 0.40102411 -0.288396509
## isothermality -0.13693475 -0.114643300 -0.07425794 0.179703108
## rain_driest_month -0.45716341 0.243048393 0.30215832 0.303345580
## rain_mean_annual -0.22985034 0.007408206 -0.22960692 -0.080573616
## rain_seasonailty -0.25191662 0.037103442 0.53951155 0.391377681
## rain_wettest_month -0.23353667 -0.083398594 -0.32794185 -0.101457133
## temp_annual_range -0.13243023 0.102490747 -0.09108661 -0.031914306
## temp_diurnal_range -0.36519764 -0.313152210 -0.34265751 0.134036791
## temp_max_warmest_month 0.02320349 0.366077948 -0.03954127 -0.300903434
## temp_mean_annual 0.04542687 0.182436258 0.05894965 -0.213864719
## temp_min_coldest_month 0.12077963 0.118595950 0.05242488 -0.140144463
## temp_seasonality -0.01247159 0.180868925 0.03952129 -0.073261090
## wind -0.47319066 -0.087968891 -0.11118473 -0.311500422
## cloudiness -0.08207496 0.492271218 -0.10214937 0.059657443
## PC13 PC14 PC15 PC16
## accessibility_to_cities -0.112190521 1.011526e-03 -0.011256001 -0.0059854686
## elevation -0.186584597 2.718716e-02 0.012178805 0.0120868353
## slope 0.342044595 -4.069031e-02 0.005919763 -0.0160242983
## cropland_cover 0.093224360 9.011536e-02 -0.020118348 -0.0005612962
## tree_canopy_cover 0.403522931 -9.842885e-02 0.024973810 0.0293942978
## isothermality 0.248884846 7.559277e-01 0.122218299 0.0036921553
## rain_driest_month -0.150021482 -1.415074e-01 0.185432289 0.0068543102
## rain_mean_annual -0.218574491 1.307717e-01 -0.774763729 0.0268019872
## rain_seasonailty 0.060963991 -1.094782e-05 -0.127390211 0.0408172471
## rain_wettest_month -0.297371974 5.599980e-03 0.564905357 -0.0096530169
## temp_annual_range 0.133289305 1.361336e-01 0.062085348 0.1587513455
## temp_diurnal_range 0.194998403 -3.838778e-01 -0.081833124 -0.1330264098
## temp_max_warmest_month 0.214150417 3.662095e-02 0.035468859 0.4633864044
## temp_mean_annual 0.055354205 -9.702331e-02 0.019596671 -0.7292994627
## temp_min_coldest_month 0.009590249 -9.076295e-02 -0.031029878 0.1264635379
## temp_seasonality 0.103248188 3.692409e-01 -0.021755957 -0.4348084151
## wind 0.257613126 6.870954e-03 0.003747155 0.0159750403
## cloudiness 0.507819087 -2.234689e-01 -0.009505227 -0.0367356578
## PC17 PC18
## accessibility_to_cities -0.007662261 1.984271e-11
## elevation -0.019293334 6.529019e-10
## slope 0.018967302 -8.151991e-10
## cropland_cover 0.002778203 -4.336500e-10
## tree_canopy_cover 0.004019026 -5.220946e-10
## isothermality 0.051579364 -1.677881e-09
## rain_driest_month -0.012115021 1.529479e-09
## rain_mean_annual 0.054631754 -1.298559e-09
## rain_seasonailty -0.020244153 -4.458399e-10
## rain_wettest_month -0.083061048 1.598033e-09
## temp_annual_range 0.513346442 5.809137e-01
## temp_diurnal_range -0.225311648 1.763533e-09
## temp_max_warmest_month -0.047692221 -3.934759e-01
## temp_mean_annual 0.386515085 -5.566786e-10
## temp_min_coldest_month -0.444852032 7.125419e-01
## temp_seasonality -0.567014187 2.681082e-09
## wind 0.005847516 -2.975414e-10
## cloudiness -0.026337262 2.354167e-10
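The loadings show how each variable contributes to each component, but not how much of the total variance each component captures, which matters when reading a biplot that displays only PC1 and PC2. The following is a minimal sketch of that calculation. It uses R's built-in `USArrests` data as a stand-in (an assumption for illustration, not the report's dataset), and `demo_pca`, `pc_var`, `prop_var`, and `cum_var` are illustrative names.

```r
# Stand-in data: built-in USArrests (4 numeric variables), NOT the report's dataset
demo_pca <- prcomp(USArrests, scale. = TRUE)

# The variance of each component is its squared standard deviation
pc_var <- demo_pca$sdev^2

# Proportion of the total variance captured by each component
prop_var <- pc_var / sum(pc_var)

# Cumulative proportion: how much the first k components capture together
cum_var <- cumsum(prop_var)

round(data.frame(PC = seq_along(prop_var), prop_var, cum_var), 3)
```

The same lines should apply to the report's object by substituting `world_pca` for `demo_pca`; `summary(world_pca)` prints equivalent information.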
# Create a PCA biplot using the `autoplot()` function (from the ggfortify package)
library(ggfortify) # assumed to be attached; provides autoplot() for prcomp objects
library(plotly) # assumed to be attached; provides ggplotly()
world_biplot <- autoplot(world_pca,
                         data = world_complete,
                         colour = 'country',
                         loadings = TRUE,
                         loadings.label = TRUE, # label each variable's loading arrow
                         loadings.colour = "gray50",
                         loadings.label.colour = "black",
                         loadings.label.vjust = -0.75) +
  theme_minimal() +
  theme(legend.position = "none")
# Make the graph interactive
ggplotly(world_biplot)
Figure 1. Biplot of the PCA performed on environmental and climatic variables (indicated by labeled arrows). The length of each arrow reflects how strongly that variable loads on the first two principal components (PC1 and PC2), with longer arrows indicating more variance in those directions, and the angle between two arrows indicates the correlation between the corresponding variables. Each point on the interactive biplot represents a country included in the study (hover over a point to see the country name); the closer two points are to each other, the more similar the countries are overall in multivariate space. Data: compiled and provided by Zander Venter on Kaggle and acquired through Google Earth Engine.
Variables that are highly positively correlated have an angle close to 0 degrees between their arrows, such as:
cloudiness and rain in the driest month
elevation and wind
Variables that are highly negatively correlated have an angle close to 180 degrees between their arrows, such as:
rain in the wettest month and annual temperature range
isothermality and temperature seasonality
Variables that are weakly correlated have an angle close to 90 (or, equivalently, 270) degrees between their arrows, such as:
slope and annual temperature range
temperature seasonality and rain seasonality
cropland cover and diurnal temperature range
tree canopy cover and maximum temperature of the warmest month
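The angles described above can also be checked numerically rather than by eye. The sketch below copies the PC1 and PC2 loadings for five variables from the `world_pca$rotation` output and computes the angle between pairs of arrows from their dot product. It assumes the biplot draws each arrow in the direction of its raw (PC1, PC2) loading; `arrow_angle()` is a hypothetical helper written for this illustration, not a function from any package.

```r
# PC1 and PC2 loadings copied from the world_pca$rotation output in this report
loadings_2d <- rbind(
  cloudiness         = c(-0.22391303,  0.301909855),
  rain_driest_month  = c(-0.21438961,  0.286285974),
  rain_wettest_month = c(-0.32609753,  0.030137008),
  temp_annual_range  = c( 0.35312644, -0.009461463),
  slope              = c( 0.00864602,  0.217016648)
)
colnames(loadings_2d) <- c("PC1", "PC2")

# Hypothetical helper: angle (in degrees) between two variables' arrows
arrow_angle <- function(a, b, m = loadings_2d) {
  u <- m[a, ]
  v <- m[b, ]
  cos_theta <- sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
  # Clamp to [-1, 1] to guard against floating-point overshoot before acos()
  acos(max(-1, min(1, cos_theta))) * 180 / pi
}

arrow_angle("cloudiness", "rain_driest_month")         # close to 0
arrow_angle("rain_wettest_month", "temp_annual_range") # close to 180
arrow_angle("slope", "temp_annual_range")              # close to 90
```

Because these angles are measured in the PC1-PC2 plane only, they approximate the correlations between the variables and are most reliable when the first two components capture most of the variance.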
The closer two countries (points) are to each other on the biplot, the more similar they are with regard to the environmental and climatic variables, as summarized by PC1 and PC2.
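Closeness between points can be quantified as the Euclidean distance between their scores on the first two components. The following is a minimal sketch using R's built-in `USArrests` data as a stand-in (an assumption for illustration, not the report's dataset); `demo_pca`, `scores_2d`, `target`, and `nearest` are illustrative names.

```r
# Stand-in data: built-in USArrests, NOT the report's dataset
demo_pca <- prcomp(USArrests, scale. = TRUE)

# Scores (coordinates of each observation) on the first two components
scores_2d <- demo_pca$x[, 1:2]

# Pairwise Euclidean distances between observations in the PC1-PC2 plane
d <- as.matrix(dist(scores_2d))

# For a chosen observation, list its three nearest neighbours (excluding itself)
target <- "Vermont"
nearest <- sort(d[target, colnames(d) != target])[1:3]
nearest
```

These distances use the raw scores; a plotting function may rescale the axes, so the distances seen on a plot can differ in absolute value from those computed here.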